API: Define RepairManifests action interface #10784

amogh-jahagirdar · 2024-07-25T17:57:52Z

Co-authored-by: Mathew Fournier [email protected]

Refer to #10445 for more details.

This change defines a RepairManifest Action interface where implementations should produce a new set of manifests where
duplicate entries should be removed, and entries which refer to files which no longer exist on disk should also be removed.

amogh-jahagirdar · 2024-07-25T18:04:01Z

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

+ * An action that will repair manifests. Implementations should produce a new set of manifests where
+ * duplicate entries are removed and entries which refer to files which no longer exist are removed.


The interface is prescriptive on what implementations should do in terms of duplicate entries and what implementations should do when files which no longer exist. After some thought I think it's right to define that but I can see arguments where someone wants implementations to just fail in case there are entries which refer to files which no longer exist (Note: I'm specifically talking about cases where implementations cannot recover those files). To me though that can just be an additional validation API someone adds down the line

Also I think at some level the interface needs to be prescriptive to have a clear defined Result.

So I think we may be missing a few other things here and I'm not sure if they should all happen on every invocation. In my mind

Read metadata from data files and recompute values in manifests

Remove Manifests or files with FNF

Removing duplicates

I think we probably also want to be able to do a dry run of this? Something that checks these 3 things but doesn't actually commit any changes?

NVM (dry run is in the api below)

Thanks @amogh-jahagirdar for this. Yea Russell's list sounds good, optionally, I guess we can put the options in later when we implement them.

@RussellSpitzer @szehon-ho I largely wanted to avoid a user having to think through which options they need to specify, and make sure implementations have sane defaults but I think I agree at least at the actions API level we could have separate configuraiton methods. When the procedure implementation is there, users could pass in what they want and the procedure implementation has some sane defaults (power users who use the raw actions APIs would have to set every option they want explicitly). cc @danielcweeks in case he had any thoughts.

amogh-jahagirdar · 2024-07-25T18:07:29Z

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

+    Iterable<ManifestFile> addedManifests();
+
+    /** Returns the number of duplicate files removed */
+    long duplicateFilesRemoved();


Would it be more useful if this was an Iterable of the actual duplicate files? same with the missing files below?

TBH, it didn't seem useful to me because someone who is running repair probably wants to know if the table was corrected and to which extent was it corrected, rather than the individual files....but lmk what you think @szehon-ho @RussellSpitzer @danielcweeks

I kind of feel like we do want the iterable here but the procedure could narrow this to the count in the result there

I dont have a strong preference either way, but if we dont expect millions, it makes sense to have iterable?

I agree, I think we can just put the Iterable here. Hopefully in the average repair case, it shouldn't really be millions of duplicate files?

nastra · 2024-07-26T11:58:35Z

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

+   *
+   * @return this for method chaining
+   */
+  RepairManifests dryRun();


should this be rather part of the actual Spark procedure rather than in the API? As I'm not sure how one would examine the results of a dry run in this case. For example, RemoveOrphanFilesProcedure has a dry run option part of the procedure, but not as part of the DeleteOrphanFiles API

Shouldn't this be part of "apply()" which should already exist because this inherits PendingUpdate?

Shouldn't this be part of "apply()" which should already exist because this inherits PendingUpdate?

@RussellSpitzer this inherits SnapshotUpdate in the Actions API, there's no apply() in that.

I think you're thinking of the other SnapshotUpdate that's part of the "normal" API. This brings up a good point of should this also have an interface there or not. RewriteManifests is defined in the regular API and then there's also an action for that.

The purpose of the Actions API is to enable engines to provide distributed/other custom implementations of procedures, and I think here we could just start by having it as an action and if there's a desire to add it to the typical API we could do it at that time?

should this be rather part of the actual Spark procedure rather than in the API? As I'm not sure how one would examine the results of a dry run in this case. For example, RemoveOrphanFilesProcedure has a dry run option part of the procedure, but not as part of the DeleteOrphanFiles API

Good point, I missed that this is what orphan file removal does. If I think through it a bit more: by defining dryRun() at the interface level, what we're saying is implementations must have a way of supporting that (the implementation could ignore the option technically but that would be a bad implementation of the interface).

If we look at what orphan file does, the action interface has a "deleteWith" API and the procedure will pass in a no-op deletion function.

What we could do is have an action API "commitRepairWith()" which will accept some sort of commit function to perform the manifest rewrite or if the procedure is a dry-run, some no-op commit.
It feels like it complicates the API more to do that.

I think the real fix is to return Result as part of the dryRun...let me double check if this is feasible.

Actually, I think it's good as it is @nastra. If a user wants to get the results, in the current API they'd do:

RepairManifests.Result result = actionsProvider.repairManifests(table).repairFileEntries().repairEntryStats().dryRun().execute(); // Do whatever with result

nastra · 2024-07-26T11:59:03Z

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

+ * An action that will repair manifests. Implementations should produce a new set of manifests where
+ * duplicate entries are removed and entries which refer to files which no longer exist are removed.
+ */
+public interface RepairManifests extends SnapshotUpdate<RepairManifests, RepairManifests.Result> {


should this maybe extend Action like the other actions?

I think SnapshotUpdate is the correct one to extend, here's my thinking:

1.) SnapshotUpdate already extends Action
2.) I think implementations necessarily will produce snapshots when they perform repairs (putting aside the dry run API option). The new snapshot that would be produced, at least in my mind in the reference implementation would be essentially a "rewrite manifests"-like snapshot.

ah nvm, this seems to be correct. I thought it's extending org.apache.iceberg.SnapshotUpdate while it's actually extending org.apache.iceberg.actions.SnapshotUpdate

szehon-ho

Thanks @amogh-jahagirdar for pushing this forward!

szehon-ho · 2024-07-29T17:41:53Z

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

+ * An action that will repair manifests. Implementations should produce a new set of manifests where
+ * duplicate entries are removed and entries which refer to files which no longer exist are removed.


Thanks @amogh-jahagirdar for this. Yea Russell's list sounds good, optionally, I guess we can put the options in later when we implement them.

szehon-ho · 2024-07-29T17:42:29Z

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

+    Iterable<ManifestFile> addedManifests();
+
+    /** Returns the number of duplicate files removed */
+    long duplicateFilesRemoved();


I dont have a strong preference either way, but if we dont expect millions, it makes sense to have iterable?

amogh-jahagirdar · 2024-08-01T18:23:21Z

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

+   *
+   * @return this for method chaining
+   */
+  RepairManifests dryRun();


Actually, I think it's good as it is @nastra. If a user wants to get the results, in the current API they'd do:

RepairManifests.Result result = actionsProvider.repairManifests(table).repairFileEntries().repairEntryStats().dryRun().execute(); // Do whatever with result

amogh-jahagirdar · 2024-08-01T18:30:25Z

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

+  RepairManifests repairEntryStats();
+
+  /**
+   * Configuration method for removing duplicate file entries and removing files which no longer
+   * exist in storage
+   */
+  RepairManifests repairFileEntries();


Based on the previous discussion @szehon-ho @RussellSpitzer the API now has 2 configuration methods for repairing file entries and reparing entry stats. The first one encompasses both the duplicate + missing case (I don't think there's a need to differentiate those options)

Sounds good, for completness I wonder if we need to dinstinguish what to repair.. ie, file_size_in_bytes vs metrics ,etc. Can just read the filesystem metadata and save from reading the footer, for example. Probably it is too fine grained, and this option is fine?

I feel like if we're going through the process of hitting the file (to check size or read footer), we should just repair everything. I can't see a case where you would want to rewrite the information but possibly carryover incorrect information.

szehon-ho · 2024-08-01T18:52:53Z

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

+  RepairManifests repairEntryStats();
+
+  /**
+   * Configuration method for removing duplicate file entries and removing files which no longer


szehon-ho · 2024-08-01T18:53:46Z

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

+  RepairManifests repairEntryStats();
+
+  /**
+   * Configuration method for removing duplicate file entries and removing files which no longer
+   * exist in storage
+   */
+  RepairManifests repairFileEntries();


Sounds good, for completness I wonder if we need to dinstinguish what to repair.. ie, file_size_in_bytes vs metrics ,etc. Can just read the filesystem metadata and save from reading the footer, for example. Probably it is too fine grained, and this option is fine?

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

danielcweeks · 2024-08-01T20:13:07Z

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

+   */
+  RepairManifests dryRun();
+
+  interface Result {


The one thing I feel is missing here is stats on how many entires were fixed for stats. Do we just assume everything was corrected? It's unclear from a dry run result if running the operation would result in any change if there are no duplicate/missing/recovered files, but you do need to run for missing stats.

Unfortunately, I would expect that fixed stats could be a very large number of files, which wouldn't work with the iterable model for the other categories of issues.

Good point, I don't think we should assume everything was corrected since it involves a footer read, some computation, and in all that flow things can go wrong in that for an individual file but not for another.

I've added two result methods for this:

/** Returns the number of manifest entries for which stats were incorrect */ long entryStatsIncorrect(); /** Returns the number of manifest entries for which stats were corrected */ long entryStatsRepairedCount();

I think it's important to distinguish between how many were repaired and how many were incorrect to begin with. WIth only the repair count, if it's 0 a user wouldn't be able to distinguish "was everything actually OK with the stats" vs "Hey a bunch of entry stats are incorrect but we couldn't actually do the repair".

I've also removed "added manifests", because as I was working through the implementation a bit more, I came to the conclusion that it's not needed. I think it was just a copy from the rewrite manifests results but
my thinking is that for repairing we're always working with some existing manifests and producing new manifests with rewritten contents but the "source" is always some existing manifest for repair. For repair there isn't really an "added" manifest in the sense of how the rewrite procedure defines it. cc @danielcweeks @szehon-ho @RussellSpitzer . It's also something we could add later on with a default implementation if we realize we need it.

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

amogh-jahagirdar · 2024-08-02T18:52:09Z

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

+   */
+  RepairManifests dryRun();
+
+  interface Result {


Good point, I don't think we should assume everything was corrected since it involves a footer read, some computation, and in all that flow things can go wrong in that for an individual file but not for another.

I've added two result methods for this:

/** Returns the number of manifest entries for which stats were incorrect */ long entryStatsIncorrect(); /** Returns the number of manifest entries for which stats were corrected */ long entryStatsRepairedCount();

I think it's important to distinguish between how many were repaired and how many were incorrect to begin with. WIth only the repair count, if it's 0 a user wouldn't be able to distinguish "was everything actually OK with the stats" vs "Hey a bunch of entry stats are incorrect but we couldn't actually do the repair".

amogh-jahagirdar · 2024-08-02T19:07:01Z

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java

+   */
+  RepairManifests dryRun();
+
+  interface Result {


I've also removed "added manifests", because as I was working through the implementation a bit more, I came to the conclusion that it's not needed. I think it was just a copy from the rewrite manifests results but
my thinking is that for repairing we're always working with some existing manifests and producing new manifests with rewritten contents but the "source" is always some existing manifest for repair. For repair there isn't really an "added" manifest in the sense of how the rewrite procedure defines it. cc @danielcweeks @szehon-ho @RussellSpitzer . It's also something we could add later on with a default implementation if we realize we need it.

Co-authored-by: Mathew Fournier <[email protected]>

szehon-ho · 2024-08-06T20:56:31Z

Hi, @amogh-jahagirdar @RussellSpitzer had a question. I think one motivation was to handle CheckSnapshotIntegrity API proposed in #10642

However in that pr, it is also checking metadata.jsons and manifest-list. for missing metadata-file locations.

I think there are other cases it may be worth to repair, for example snapshot summary metadata can be repaired if they are ever out of sync (as the values are mostly updated incrementally and one bad update can cascade for subsequent snapshots).

What do you guys think about a generic RepairTable, and have some analgous methods repairSnapshotSummaries or repairSnapshots ?

Otherwise we may still need #10642 to do those checks

szehon-ho · 2024-08-07T00:03:10Z

Another one is : #10535 corrupt manifest file. It may make sense to repair those as well. So in summary:

repairSnapshotMetadata (read snapshot file and remove or repair snapshot metadata)
repairManifestMetadata (read manifest file and remove or repair manifest metadata)
repairFileMetadata (read data/delete file and remove or repair file metadata)

Let me know what you think, if its within the scope here

amogh-jahagirdar · 2024-08-08T04:39:42Z

@szehon-ho I think I agree that the use cases presented are legitimate repair cases however I'm not sure that we should be combining all of them into a single RepairTable procedure upfront. I understand the appeal but my thinking is that the implementation gets complex. My thinking is that users will typically hit some issue, like missing file referenced in their manifest or hit duplicate files and then just want to solve that issue with a focused procedure. Or in the other missing metadata file/missing manifest list/manifest/corrupted snapshot case run a separate procedure.

That said one thing to consider is do we need these procedures to move in lockstep somehow to ensure that statistics/snapshot summaries roll up correctly? I don't think that's the case since in most cases we are producing new snapshots entirely after the repair and the deltas in rows/files added/deleted should be accumulated correctly? We'll probably need tests for that.

I do think that we could add a RepairTable down the line which combines multiple smaller repair actions so that the implementation of RepairTable itself isn't super complex. It comes at the cost of having more in the action API surface area but I feel like that's OK?

More concretely at least what I'm thinking so far:

RepairManifests which is focused on looking at the table's current manifests and removing duplicates/attempting recovery of missing files or deleting those entries if those files cannot be recovered
RepairTableMetadata which would do the historical metadata file recovery, manifest list repairs and snapshots?

@szehon-ho I actually had a question on the snapshot repair, based on the description the goal of that is to repair snapshot summary stats which may have been corrupted. Doesn't that necessarily mean we must mutate the existing snapshot (and subsequent snapshots) to correct it?

Overall though I'll def give the general RepairTable more thought, just feels a bit complex to put everything in one go up front rather than have focused actions.

szehon-ho · 2024-08-08T06:40:24Z

Makes sense @amogh-jahagirdar !

@szehon-ho I actually had a question on the snapshot repair, based on the description the goal of that is to repair snapshot summary stats which may have been corrupted. Doesn't that necessarily mean we must mutate the existing snapshot (and subsequent snapshots) to correct it?

i was just thinking we make a new snapshot with correct summary stats (only the totals). But yes you are right, it is open to interpretation, in this case you cant go back and fix wrong stats, so this particular feature probably does need more thought.

amogh-jahagirdar · 2024-08-29T05:16:42Z

Makes sense @amogh-jahagirdar !

@szehon-ho I actually had a question on the snapshot repair, based on the description the goal of that is to repair snapshot summary stats which may have been corrupted. Doesn't that necessarily mean we must mutate the existing snapshot (and subsequent snapshots) to correct it?

i was just thinking we make a new snapshot with correct summary stats (only the totals). But yes you are right, it is open to interpretation, in this case you cant go back and fix wrong stats, so this particular feature probably does need more thought.

Really sorry for the delayed response @szehon-ho I forgot I had this open. So I was discussing this with @rdblue and I am more convinced of your point that we may as well have a unified RepairTable action with different configuration methods. The compelling arguments for me at least is that it follows the existing API patterns, for example ManageSnapshots has different options but serves as a useful entry point. RepairTable is a useful entry point for a user, there can be sane defaults on what to repair, and power users can specify which specific operations they want to run if they know what's broken. Another example of this came up in https://github.com/apache/iceberg/pull/10755/files#r1696094932 for removing unused specs where we were thinking of having an entry point maintenance API for both removing unused specs and schemas. So from an API consistency perspective, I think it's good to have the same pattern for this action.

Furthermore, the procedure is traversing the entire metadata tree so practically there's probably overlap across the different repair operations.

I can update the PR based on this

github-actions · 2024-11-10T00:16:13Z

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions · 2024-11-17T00:17:05Z

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

wypoon · 2025-01-30T02:00:46Z

For my edification, can someone please explain how duplicate file entries in manifests can arise? Can two entries for the same file occur in a single manifest? Can even two manifests be in the same manifest list if they overlap (have an entry for the same file in common)? I'd have thought both of these situations would be bugs. Or are there actual sequences of operations that lead to such outcomes, similar to how dangling deletes can occur?
Basically, I'm trying to understand what repairs might be needed due to operations in Iceberg (and which of these are bugs), and what are needed only due to users doing something outside of Iceberg. Deleting a file from storage falls into the latter category.
Also, I understand that there was an old bug where data file size was written incorrectly and this actually caused reads to fail, and this is the motivation for correcting the statistics in metadata. However, that bug was long fixed, so I wonder if there are still known situations where these statistics need to be corrected.

amogh-jahagirdar · 2025-01-30T02:45:12Z

For my edification, can someone please explain how duplicate file entries in manifests can arise? Can two entries for the same file occur in a single manifest? Can even two manifests be in the same manifest list if they overlap (have an entry for the same file in common)? I'd have thought both of these situations would be bugs. Or are there actual sequences of operations that lead to such outcomes, similar to how dangling deletes can occur?

Sure, so one example of an issue that happened in the past is that in the Kafka Connect integration we ended up appending the same file multiple times. We rectified that in Kafka connect and the Iceberg library for duplicate appends in the same snapshot, but it's still technically possible to append the same file across different snapshots (at least in the reference implementation and probably a few others).

Detecting overlapping files involves an expensive read through current manifest(s) to deduplicate which if performed on every append would be prohibitively expensive for the operation.

Overlapping files across manifests does imply a bug in whatever integration is writing to the table. However, even after those bugs are fixed, it still makes sense for Iceberg to expose repair procedures to correct those tables to unblock users from using their tables.
More generally, imo it makes sense to offer a general RepairTable procedure with different options which enable users to be able to correct their tables as best as possible, in the case a bad implementation ended up writing to them.

Can even two manifests be in the same manifest list if they overlap (have an entry for the same file in common)?
That'd also be a bug. I haven't seen any issues on that but I wouldn't be opposed to the repair table procedure handling that (probably not the biggest concern imo). It's relatively easy to detect and dedupe that since it's high in the metadata tree.

Also, I understand that there was an old bug where data file size was written incorrectly and this actually caused reads to fail, and this is the motivation for correcting the statistics in metadata. However, that bug was long fixed, so I wonder if there are still known situations where these statistics need to be corrected.

Yeah I think the same I mentioned above applies where some random implementation may end up writing incorrect statistics for whatever reason, and it'd be good for repair table to correct that since stats are something that can be deterministically corrected.

wypoon · 2025-01-30T05:48:41Z

Another question I have about duplicate entries (in the manifests) is: do their presence make the table unreadable? Or is the table still readable and it is a valid state, although undesirable (as is the case with dangling deletes, for example)?

An observation is that the existing subinterfaces of org.apache.iceberg.actions.SnapshotUpdate are all actions that do not change the state of the table (the data remains the same). This is not the case with removing data (or delete) files from the metadata because those files are missing from the storage. In that case, the state of the table is changed; either data is deleted or added (due to "undeletion" from removing delete files) or both. For this reason, I think that removing files from metadata is logically different from "repair" operations such as deduplication of entries or correcting statistics for entries.

flyrain · 2025-02-22T02:53:36Z

Do we still pursue this?

danielcweeks · 2025-02-22T04:06:12Z

Do we still pursue this?

Yes, @amogh-jahagirdar is still looking at this and we continue to see use cases where this is necessary.

flyrain · 2025-02-24T19:58:20Z

@danielcweeks thanks for the confirmation. One of use cases is to check the integrity of a copied table from different places. The use cases are increasing as rewriteTablePath was out with the release 1.8. People will need a tool to validate before they feel confident that the migration is done.

github-actions · 2025-03-27T00:15:56Z

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions · 2025-04-03T00:15:59Z

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

mdub · 2025-07-04T23:17:17Z

I see this PR has been closed, which is sad. Should it be re-opened, @amogh-jahagirdar, @danielcweeks?

I'd love to see repairManifests integrated, as it was a life-saver when we experienced corruption in our Iceberg tables, earlier this year.

scottedwards2000 · 2025-07-28T00:48:02Z

@danielcweeks thanks for the confirmation. One of use cases is to check the integrity of a copied table from different places. The use cases are increasing as rewriteTablePath was out with the release 1.8. People will need a tool to validate before they feel confident that the migration is done.

As an enthusiastic user of the new rewriteTablePath, I would find this use case compelling. I didn't actually realize that was within the scope of this PR since the name implies a focus on "repair". Any ETA on when we could use it for this migration confirmation?

mrcnc · 2025-08-06T20:56:08Z

Another question I have about duplicate entries (in the manifests) is: do their presence make the table unreadable? Or is the table still readable and it is a valid state, although undesirable (as is the case with dangling deletes, for example)?

I believe this depends on the reader. According to these docs it appears Snowflake will not read tables that have duplicates in the manifest files

amogh-jahagirdar · 2025-08-07T19:47:05Z

Ok this has been open for quite some time, and I haven't been able to follow through with this due to bandwidth but now I'm going to make time to follow through on this since I've received multiple reach outs on the usefulness of this procedure. From what I recall the pending decision and update had to do with making this a generic RepairTable Action API and exposing specific configuration options for selecting which operations to run.

I will update this week when I get a chance, in the mean time for folks who have table with duplicate manifests can build and run from https://github.com/amogh-jahagirdar/iceberg/tree/repair-manifests-action

amogh-jahagirdar requested a review from szehon-ho July 25, 2024 17:57

github-actions bot added the API label Jul 25, 2024

amogh-jahagirdar requested review from Fokko, RussellSpitzer, aokolnychyi, danielcweeks and nastra July 25, 2024 17:58

amogh-jahagirdar commented Jul 25, 2024

View reviewed changes

nastra reviewed Jul 26, 2024

View reviewed changes

szehon-ho reviewed Jul 29, 2024

View reviewed changes

amogh-jahagirdar force-pushed the repair-manifests-api branch 2 times, most recently from 4b2ddfe to 83ca336 Compare August 1, 2024 18:20

amogh-jahagirdar requested review from nastra and szehon-ho August 1, 2024 18:30

amogh-jahagirdar commented Aug 1, 2024

View reviewed changes

szehon-ho reviewed Aug 1, 2024

View reviewed changes

danielcweeks reviewed Aug 1, 2024

View reviewed changes

api/src/main/java/org/apache/iceberg/actions/RepairManifests.java Outdated Show resolved Hide resolved

danielcweeks reviewed Aug 1, 2024

View reviewed changes

amogh-jahagirdar force-pushed the repair-manifests-api branch from 83ca336 to 8bd7c58 Compare August 2, 2024 18:57

amogh-jahagirdar commented Aug 2, 2024

View reviewed changes

amogh-jahagirdar requested review from danielcweeks and szehon-ho August 2, 2024 19:11

amogh-jahagirdar force-pushed the repair-manifests-api branch from 8bd7c58 to cb297db Compare August 2, 2024 22:45

API: Define RepairManifests action interface

eb2041b

Co-authored-by: Mathew Fournier <[email protected]>

amogh-jahagirdar force-pushed the repair-manifests-api branch from cb297db to eb2041b Compare August 2, 2024 22:45

amogh-jahagirdar requested a review from rdblue August 8, 2024 04:51

github-actions bot added the stale label Nov 10, 2024

github-actions bot closed this Nov 17, 2024

danielcweeks reopened this Jan 28, 2025

github-actions bot removed the stale label Jan 29, 2025

github-actions bot added the stale label Mar 27, 2025

github-actions bot closed this Apr 3, 2025

danielcweeks reopened this Jul 7, 2025

danielcweeks added not-stale and removed stale labels Jul 7, 2025

dramaticlly mentioned this pull request Sep 4, 2025

rewrite_table_path does not update sizes of rewritten manifests in manifest lists #13719

Open

3 tasks

geruh mentioned this pull request Oct 16, 2025

API: Define RepairManifests action interface #14341

Closed

		* An action that will repair manifests. Implementations should produce a new set of manifests where
		* duplicate entries are removed and entries which refer to files which no longer exist are removed.

API: Define RepairManifests action interface #10784

Are you sure you want to change the base?

API: Define RepairManifests action interface #10784

Conversation

amogh-jahagirdar commented Jul 25, 2024

Uh oh!

amogh-jahagirdar Jul 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nastra Aug 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

szehon-ho left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

amogh-jahagirdar Aug 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

amogh-jahagirdar Aug 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

szehon-ho commented Aug 6, 2024

Uh oh!

szehon-ho commented Aug 7, 2024

amogh-jahagirdar Jul 25, 2024 •

edited

Loading

nastra Aug 6, 2024 •

edited

Loading

amogh-jahagirdar Aug 2, 2024 •

edited

Loading

amogh-jahagirdar Aug 2, 2024 •

edited

Loading

amogh-jahagirdar commented Jan 30, 2025 •

edited

Loading